ABSTRACT

One-to-one tutoring is more effective than alternative training methods, yet there have been few attempts to
examine the process of naturalistic tutoring. This project explored dialogue patterns in two corpora:
graduate students tutoring undergraduates in research methods, and high school students tutoring 7th
graders in algebra. We analyzed pedagogical strategies, feedback mechanisms, question asking, question
answering, and pragmatic assumptions during the tutoring process. One pervasive dialogue pattern was a
five-step frame: (1) tutor asks question, (2) student answers question, (3) tutor gives short feedback on
answer quality, (4) tutor and student collaboratively improve on answer quality, and (5) tutor assesses the
student's understanding of the answer. Tutor questions were primarily motivated by curriculum scripts
and the process of coaching students through exemplar problems - rarely by attempts to diagnose and
remediate the student's idiosyncratic knowledge deficits.

Dialogue patterns were simulated by two computational models: a recurrent connectionist network and a 
recursive transition network. These models capture the systematicity in the sequential ordering of speech
act categories. That is, to what extent does a model accurately predict the category of speech act N+1,
given speech acts 1 through N? 
It is well documented that one-to-one tutoring is a better method of training students than normal
pedagogical strategies in classroom settings. The effect size of the advantage of tutoring over classrooms
has ranged from .4 to 2.3 standard deviation units (Bloom, 1984; Cohen, Kulik, & Kulik, 1982; Mohan,
1972). However, it is difficult to determine the cause of this advantage until there is a better understanding
of the tutoring process.

Unfortunately, only a handful of studies have systematically examined the process of tutoring at a
fine-grained level (Fox, 1992; Graesser, 1992, 1993; Graesser & Person, in press; Leinhardt, 1987;
McArthur, Stasz, & Zmuidzinas, 1990; Miyake & Norman, 1979; Putnam, 1987; van Lehn, 1990). It
takes a great deal of time and effort to perform an in-depth qualitative analysis of tutorial interaction.
Consequently, some of the observations and results reported by these researchers may have limited
generality. Because of limited sample sizes in qualitative process-oriented studies, there have been few
attempts to relate components of the tutorial process to student achievement or to tutoring outcomes. In the
present project, we analyzed patterns of tutorial dialogue in a comparatively large sample of tutoring
sessions.

According to Cohen et al.'s (1982) meta-analysis of 52 tutoring studies, the impact of tutoring on learning
is not significantly related to the amount of tutoring training that the tutors received. It is also not related to
age differences between tutor and student. In some studies, the peers of the students do an excellent job
serving as tutors for students having problems (Fantuzzo, King, & Heller, 1992; Mohan, 1972; Rogoff,
1990). These outcomes are rather counterintuitive. Most of us would expect that tutor age and
expertise would improve learning outcomes. One explanation of these results is that the training and
expertise of tutors is normally minimal in naturalistic tutoring sessions. Most tutors in a school system are
peers of the students, slightly older students, paraprofessionals, and adult volunteers rather than highly
skilled tutors (Fitz-Gibbon, 1977). Perhaps a tutor needs extensive training on both the topic knowledge
and tutoring strategies before tutoring expertise shows appreciable gains in learning outcomes.
Nevertheless, the counterintuitive finding does support one conclusion about the relationship between
tutoring process and outcome: The reported facilitation of tutoring over classroom settings can be
attributed to pervasive dialogue patterns of normal tutors rather than to special pedagogical strategies of
highly trained tutors.

Several hypotheses may explain the advantage of one-to-one tutoring over classroom settings. According
to an active inquiry hypothesis, students perhaps have more active control over their learning in tutoring
sessions and therefore have a better chance of correcting their own idiosyncratic knowledge deficits.
Educational researchers have frequently advocated the construction of educational settings that promote
active learning (Bransford, Arbitman-Smith, Stein, & Vye, 1985; Brown, 1988; Nathan, Kintsch, &
Young, 1992; Papert, 1980; Scardamalia, Bereiter, McLean, Swallow, & Woodruff, 1989; Zimmerman,
Bandura, & Martinez-Pons, 1992). Tutoring allegedly supplies such an environment. According to an
error-remediation hypothesis, tutoring provides an opportunity for the tutor to diagnose and repair the
idiosyncratic misconceptions and knowledge deficits of a particular student (Anderson & Reiser, 1985;
Anderson, Conrad, & Corbett, 1989; van Lehn, 1990). Teachers in classrooms have the time to focus on
general problems of several students, but rarely the idiosyncratic problems of a particular student.
According to an explanatory reasoning hypothesis, tutoring may expose patterns of reasoning and problem
solving that a classroom setting cannot furnish because of time and resource limitations. Learning is
facilitated to the extent that students construct explanations and justifications of the content in the material
to be learned (Anderson et al., 1989; Chi, Bassok, Lewis, Reimann, & Glaser, 1989; Cobb, Wood,
Yackel, & McNeal, 1992; Kieras, 1992; Moore & Ohlsson, 1992; Pressley, Symons, McDaniel, Snyder,
& Turnure, 1988; Reiser, Kimberg, Lovett, & Ranney, 1991). There no doubt are additional hypotheses
that account for the advantages of tutoring over classroom settings. The analyses in this project narrowed
down the set of plausible hypotheses.

Ideal tutoring strategies have been proposed by researchers investigating the cognitive foundations of
complex learning and by developers of intelligent tutoring systems (Bransford, Goldman, & Vye, 1991;
Lesgold, 1992; Ohlsson, 1986; Scardamalia et al., 1989; Sleeman & Brown, 1982). These researchers
have identified pedagogical techniques that the tutor can implement during tutoring, such as the Socratic
method (Collins, 1985), inquiry teaching (Collins, 1988), diagnosis-remediation (Anderson & Reiser,
1985; van Lehn, 1990), the reciprocal training method (Palincsar & Brown, 1984),
modeling-scaffolding-fading (Collins, Brown, & Newman, 1989; Rogoff, 1990), and curriculum scripts (Putnam, 1987).
These pedagogical techniques fall somewhere between the extremes of complete student control (i.e.,
active inquiry by the student) and complete tutor control (i.e., a tutor lecture). However, the extent to
which these pedagogical techniques have been used in naturalistic tutoring has yet to be documented.
Given that the vast majority of tutors in school systems have received little or no training in tutoring
(Fitz-Gibbon, 1977), the sophisticated pedagogical techniques presumably are infrequent.

This ONR project investigated the dialogue patterns in naturalistic tutoring sessions. We analyzed tutorial
dialogue as knowledge was collaboratively constructed and modified. In addition to documenting some
basic facts about tutorial dialogue, we focused on four components in depth:

1. Question asking and answering. What mechanisms account for the questions and answers of
tutors and students?

2. Feedback during the construction of common ground. Does the student give accurate
feedback to the tutor on the student's understanding of the material? Does the tutor give the
student accurate feedback on the quality of the student's contributions?

3. Dialogue patterns. What are the pervasive dialogue patterns during tutoring? In particular, we
will concentrate on a 5-step dialogue frame.

4. Pragmatic assumptions. What pragmatic assumptions are followed during tutoring? To what
extent are these assumptions the same as or different from the pragmatic assumptions in everyday
conversation?

These aspects of tutorial dialogue may or may not be compatible with the goals of good pedagogy. We 
will identify ways that tutors might strategically improve learning by changing the normal course of tutorial 
dialogue. 

We reported some analyses of tutoring sessions in previous reports (Graesser, 1992, 1993; Graesser,
Person, & Huber, 1992, 1993; Graesser & Person, in press; Person, Graesser, Magliano, & Kreuz,
1993). A final report on our previous ONR grant ("Questioning Mechanisms during Complex Learning",
N00014-90-J-1492, R&T 4422548) summarizes earlier analyses of the tutoring data.



Naturalistic Tutoring Sessions: Two Corpora 

Research methods corpus 

Graduate students in the psychology department at Memphis State University tutored undergraduate
students on troublesome topics in a research methods course (offered by the psychology department). All
25 students in the course were tutored as part of a course requirement, so there was a full range of student
achievement (i.e., not just underachieving students). The three tutors had received A's in a graduate-level
research methods course. Therefore, the corpus involved "cross-age" tutoring, which is one of the
common types of tutoring in school systems. The tutors had never tutored in the area of research methods
before this study, but they had occasionally tutored on other topics.

There were 44 one-hour tutoring sessions. The tutoring sessions were videotaped and transcribed. The
room used for tutoring was equipped with a video camera, a television set, a marker board, colored
markers, and the textbook for the course. The camera was positioned so that the student and the entire
marker board were in sight. Therefore, the transcripts of the tutoring sessions included both spoken
utterances and messages on the marker board. The transcribers were instructed to transcribe the entire
tutoring sessions, including all "ums", "ahs", word fragments, broken sentences, and pauses. Messages
on the marker board were sketched in as much detail as possible.

The sessions covered six troublesome topics in an undergraduate research methods course. The topics
were operational definitions of variables, graphs, inferential statistics, the evolution of hypothesis to
design, factorial designs, and interactions. An index card was prepared for each topic; 3-5 subtopics were
listed under each topic. The tutor was asked to cover the topic and subtopics on the index card during
the course of the tutoring session. The tutors were not given a specific format to follow, but they were
told to resist the temptation of simply lecturing to the student. The students were exposed to the material
covered on a topic before they participated in a tutoring session. The topic was covered in a classroom
lecture by the instructor before the tutoring session. In addition, both the student and the tutor were
required to read specific pages in a research methods text before the tutoring session.

Each of the 25 students participated in two tutoring sessions, yielding 50 sessions altogether. Each
student was randomly assigned to 2 of the tutors. Six of the 50 sessions could not be analyzed because
the voices were not sufficiently audible on the videotape. Thus, analyses were performed on 44 tutoring
sessions.

Examination scores and final grades were available for the 25 undergraduate students, so we could 
investigate the relationship between student achievement and tutoring processes. A total examination score 
was based on three objective examinations throughout the semester; there was a total of 150
four-alternative forced-choice questions. The 25 students had a mean score of 100.6 (SD = 11.4). Regarding
the final grade received in the course, 4 students received an A, 9 received a B, 10 received a C, and 4
received a C- or D. 

Algebra corpus

This corpus consisted of 22 tutoring sessions in which high school students tutored 7th graders on
troublesome topics in algebra. There were 13 students who were having trouble with particular topics in
their algebra course (according to their teachers). There were 10 tutors who normally provided the
tutoring services for the middle school. On the average, a tutor had 9 hours of prior tutoring experience
before tutoring a student in this sample. The corpus of tutoring sessions included almost all of the tutoring
sessions that occurred in the middle school for 7th graders learning algebra during a one month period.
Unlike the research methods corpus, the tutoring sessions in this algebra corpus were remedial activities
for underachieving students. Unfortunately, grades and test scores were not available for these students,
so it was not possible to assess the relationship between achievement and tutoring processes.

Almost all of the tutoring sessions covered three tutoring topics that are frequently problematic to 7th
graders. These include (a) calculation of positive and negative numbers, (b) constructing equations from
algebra word problems, and (c) fractions. An examination and chapter excerpt from a textbook were
normally associated with each topic. The tutoring sessions lasted approximately 60 minutes, which was
comparable to the research methods corpus. A research assistant from Memphis State University
videotaped the sessions in the same manner as the sessions in the research methods
corpus.

Reliability of scoring tutoring transcripts on content variables

Previous reports and articles have discussed how the transcripts were analyzed on content categories
(Graesser, 1992; Graesser & Person, in press; Graesser, Person, & Huber, 1992, 1993). Therefore,
these details will not be covered in this report. Trained research assistants were capable of reliably coding
most of the data: segmenting transcripts into speech act units, assigning speech acts to speech act
categories, identifying questions, assigning questions to question categories, identifying mechanisms that
generate questions, and classifying tutor feedback. Whenever these categories were scored, two judges
independently furnished the judgments and achieved sufficient interjudge reliability (i.e., Cronbach's alpha
= .70 or higher).
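
As an illustration of the reliability criterion, the following minimal sketch computes Cronbach's alpha for two judges who score the same units. The ratings are invented for the example and the function name `cronbach_alpha` is hypothetical; the report does not describe how the coefficient was actually computed.

```python
from statistics import variance


def cronbach_alpha(ratings_by_judge):
    """Cronbach's alpha, treating each judge as one 'item'.

    ratings_by_judge: list of equal-length lists, one list per judge.
    """
    k = len(ratings_by_judge)
    # Sample variance of each judge's ratings across the scored units.
    judge_vars = [variance(r) for r in ratings_by_judge]
    # Variance of the summed ratings for each scored unit.
    totals = [sum(unit) for unit in zip(*ratings_by_judge)]
    return (k / (k - 1)) * (1 - sum(judge_vars) / variance(totals))


# Hypothetical data: two judges rate eight units on a 1-4 scale.
judge_a = [1, 2, 3, 4, 4, 2, 3, 1]
judge_b = [1, 2, 3, 4, 3, 2, 4, 1]
alpha = cronbach_alpha([judge_a, judge_b])
print(f"alpha = {alpha:.2f}")
```

With these invented ratings the judges disagree on two of the eight units, and the coefficient still clears the .70 threshold.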

The judges needed to have more expertise in the case of some coding analyses. One such analysis
concerned the quality of a contribution in a tutoring session. There were four levels of answer quality:
(1) error-ridden answer, (2) vague answer or no information, (3) partially correct answer, and (4)
completely correct answer. The judges needed to have a high amount of domain knowledge about
research methods to make these judgments. Therefore these judgments were made by professors,
postdocs, or 4th-year graduate students in experimental psychology. Other analyses that required special
expertise involved global levels of the tutorial dialogue (e.g., whether an excerpt involved the application
of a curriculum script, error-remediation, or some other global process). In this case, the judges needed to
have sophisticated knowledge about the tutoring process in addition to extensive domain knowledge. A
pair of judges collaboratively supplied judgments in the case of dimensions or categories that required high
expertise.



Student Contributions in Tutorial Dialogue

Tutorial dialogue is presumably guided or constrained by the knowledge deficits and misconceptions of a
particular student. To what extent does the student actively guide tutorial dialogue? Does the tutor
accurately infer the level of knowledge and the misconceptions of the student? Is the student capable of
detecting his or her own knowledge deficits and level of understanding? This section addresses the role of
the student in tutorial dialogue. We present a number of claims, with empirical data backing each claim.

Claim 1: Students rarely control tutorial dialogue.

Students rarely initiate exchanges that exert control over the tutorial dialogue. In the research methods
corpus, only 5% of the subtopics were initiated by the student whereas 95% were initiated by the tutor.
The corresponding percentages in the algebra sample were 10% and 90%, respectively. When students
did initiate a new subtopic, they normally brought up an example problem or concept that they were having
difficulty with (e.g., "I had trouble with problem 4", "I don't understand what an antagonistic interaction
is"). Students never set the agenda for the tutoring session. In both tutoring corpora, the tutor carried the
burden of setting the agenda, introducing subtopics, and proposing problems to solve.

This result is incompatible with the active inquiry hypothesis that was briefly discussed earlier. That is, 
the advantage of tutoring over classroom settings cannot be attributed to the student taking active control of
the learning experience. With rare exceptions, students were not inquisitive, active, self-regulators of their 
knowledge in these tutoring sessions. Tutors need to impose special strategies of transferring control to 
the student if there is a commitment to promote active learning. Such strategies were not in the repertoire 
of the normal tutor. 

There was one finding that indicated that students are somewhat more active in tutoring contexts than in 
classroom settings. Student questions were more frequent in the tutoring settings than in classroom 
settings (Graesser & Person, in press). The mean number of student questions per hour was 21.1 (SD =
13.0) in the research methods corpus and 32.2 (SD = 19.7) in the algebra corpus. In contrast, a particular
student in a classroom setting asks only .11 questions per hour; an entire class of students asks only 3.0
questions per hour (Dillon, 1988; Graesser & Person, in press). From the standpoint of a single student,
student questions were approximately 250 times as frequent in tutoring sessions as in classrooms. In spite
of the high incidence of student questions during tutoring, tutor questions were substantially more
prevalent than student questions in tutoring sessions. We found that 80% of the questions in a session
were asked by the tutor (82% in the research methods corpus and 78% in the algebra corpus). This
percentage is somewhat lower than the percentage of teacher questions in a classroom (96%). In
summary, student questions are much more prevalent in tutoring sessions than in classrooms, but it is still 
the tutor who asks most of the questions and thereby governs the course of the session. 
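
The rate comparison can be checked with a few lines of arithmetic. The sketch below uses only figures reported in this section; averaging the two corpus means with equal weight is an assumption made for the illustration, since the report does not say how the overall figure was derived.

```python
# Mean student questions per hour reported for each tutoring corpus.
research_methods_rate = 21.1
algebra_rate = 32.2
# Questions per hour asked by a single student in a classroom.
classroom_rate_per_student = 0.11

# Assumption: weight the two corpora equally.
tutoring_rate = (research_methods_rate + algebra_rate) / 2
ratio = tutoring_rate / classroom_rate_per_student

print(f"tutoring: {tutoring_rate:.2f} questions per hour")
print(f"ratio to classroom: about {ratio:.0f} to 1")
```

Under this assumption the ratio comes out a little above 240, in line with the rounded figure of approximately 250.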

Most of the questions that students asked during the tutoring session did not address their own knowledge 
deficits. Knowledge deficit questions occur under the following conditions: (a) when the student
encounters an obstacle in a plan or problem, (b) when the student detects a contradiction, (c) when an
unusual or anomalous event is detected, (d) when there is an obvious gap in the student's knowledge base,
and (e) when the student needs to make a decision among a set of alternatives that are equally likely
(Graesser & McMahen, 1993; Graesser, Person, & Huber, 1992, 1993). Only 29% of the student
questions were knowledge-deficit questions (Graesser & Person, in press), which amounts to 7.7
questions per hour. Most of the student questions (54%) were attempts to confirm the validity of their
own beliefs (e.g., "Doesn't a factorial design have two independent variables?") or to confirm common
ground (e.g., "Are you talking about the second condition?").
Good students did not ask more questions. Good students also did not tend to ask more
knowledge-deficit questions. The frequency of student questions was not robustly related to achievement in the
research methods corpus. The correlations were low between examination scores and (a) the total number
of student questions (r = -.22) and (b) the proportion of student questions that addressed knowledge
deficits (r = .15). The correlations were also low when final grade was the measure of achievement (r =
-.34, p < .05 for total number of questions; r = .32 for proportion of questions that involved knowledge
deficits). Other researchers have also failed to show a positive relationship between question asking and
achievement (Fishbein, Eckart, Lauver, van Leeuwen, & Langmeyer, 1990).

In summary, the available evidence supports claim 1. Students rarely take an active role in governing the
agenda in the tutoring session. They rarely expose their own knowledge deficits and actively seek
remediation. Students ask far fewer questions than tutors and most of their questions do not address their
knowledge deficits. It is not the case that the good students are more active and ask more questions. 
Students apparently need to be trained how to ask questions and to be active learners. It is the tutor who 
carries the burden of establishing the tutoring agenda, introducing topics, presenting examples to work on, 
and exposing the student's knowledge deficits. The active inquiry hypothesis does not explain why 
learning is better in one-to-one tutoring than classroom settings. 

Claim 2: Deep reasoning questions are prevalent in tutoring sessions.

There is extensive evidence that comprehension improves if students are trained how to ask good 
questions and to seek answers to the questions (King, 1989, 1992; Rosenshine & Chapman, 1990; Singer 
& Donlan, 1982; Wong, 1985). However, the process of asking good questions does not come naturally 
to students, so they need to be trained in developing this cognitive skill (Pressley, 1990). Therefore, we 
investigated the quality of questions in the tutoring protocols.

One index of question quality is whether the question exposes deep reasoning about the problems and
domain topics. In logical reasoning, the statements expressed in an answer consist of the premises and
conclusions of a logical syllogism. In causal reasoning, the answer conveys the antecedents and
consequences of events. In goal-oriented reasoning, the answer traces the goals and planning of agents.
It is well documented that comprehension and memory for technical material improve to the extent that the
learner constructs explanations and justifications (Chi et al., 1989; Cobb et al., 1992; Pressley et al.,
1988). According to the explanatory reasoning hypothesis discussed earlier, tutoring facilitates learning
because it exposes explanations and justifications. 

Graesser's question taxonomy specifies those question categories that expose deep reasoning (Graesser &
Person, in press; Graesser, Person, & Huber, 1992, 1993). They include the following six categories.

1. Antecedent questions (why?, how?). What caused a state or event? What logically explains
or justifies a proposition?

2. Consequence questions (what if?, what next?). What are the causal consequences of a state or
event? What are the logical consequences of a proposition?

3. Goal orientation (why?). What are the goals or motives behind an agent's action?

4. Enablement (why?, how?). What object or resource allows an agent to perform an action?
What state or event allows another state or event to occur?

5. Instrumental/procedural (how?). What instrument or plan allows an agent to accomplish a
goal?

6. Expectational (why not?). Why did an expected state or event not occur? Why didn't an
agent do something?
These questions are manifested in a tutoring session to the extent that the tutor and student explore deeper
levels of comprehension. It should be noted that these deep reasoning questions were highly correlated
with the deeper levels of cognition in Bloom's taxonomy of educational objectives in the cognitive domain
(Bloom, 1956), r = .64, p < .05. Low-level questions in Bloom's taxonomy inquire about specific facts,
terminology, and explicit information in a text; deeper level questions involve reasoning, application,
analysis, synthesis, and evaluation (see also Scardamalia & Bereiter, 1992).

Our analysis of the research methods corpus and algebra corpus uncovered an impressive number of deep 
reasoning questions. The proportion of student questions that were deep reasoning questions was .22 in 
the research corpus and .39 in the algebra corpus; the corresponding proportions for tutor questions were 
.16 and .17, respectively. In a typical tutoring session, a student asked approximately 8 deep reasoning
questions (per hour) whereas a tutor asked 19 such questions. The incidence of deep reasoning questions was
much higher in the tutoring sessions than in normal classroom settings, according to our best estimates
from published studies on classroom questioning (Dillon, 1988; Graesser & Person, in press). The
incidence of student questions in a classroom is extremely low in all published studies (.11 questions per
student per hour), so deep reasoning questions would also be low. Only 4% of the teacher questions in a
classroom are deep questions in Bloom's taxonomy; the vast majority of teacher questions are
short-answer questions that grill students on explicit material (Dillon, 1988; Kerry, 1987). Therefore, the
explanatory reasoning hypothesis provides a very plausible account of the finding that learning is better in
tutoring than in classroom settings. 
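
The per-hour figure for student deep reasoning questions can be approximated from numbers reported in this section. This is a rough sketch: it multiplies each corpus's mean student question rate by its deep reasoning proportion and averages the two corpora with equal weight, which is an assumption rather than the report's stated procedure.

```python
# (mean student questions per hour, proportion that were deep reasoning)
corpora = {
    "research methods": (21.1, 0.22),
    "algebra": (32.2, 0.39),
}

deep_per_hour = {
    name: rate * proportion for name, (rate, proportion) in corpora.items()
}
# Assumption: weight the two corpora equally.
average_deep = sum(deep_per_hour.values()) / len(deep_per_hour)

for name, value in deep_per_hour.items():
    print(f"{name}: {value:.1f} deep reasoning questions per hour")
print(f"average: {average_deep:.1f}")
```

The equal-weight average falls between 8 and 9 questions per hour, consistent with the reported figure of approximately 8.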

The good students asked a higher proportion of deep reasoning questions. There was a significant 
positive correlation between the proportion of student questions that were deep reasoning questions and (a)
examination scores (r = .44, p < .05) and (b) final grades (r = .58, p < .05). Therefore, good students
penetrated the deeper levels of comprehension. 

Although the incidence of deep reasoning questions is quite high in tutoring sessions, we believe that the
quality of student questions and tutor questions could substantially improve. Most of the students' deep
reasoning questions were in the instrumental/procedural category (.59 in the research methods corpus and
.74 in the algebra corpus). This is the least sophisticated category of the deep reasoning questions. The
student is merely requesting that the tutor describe how to compute a function or perform a procedure
(e.g., "How do you solve this problem?"). The student might learn how to apply a formula or procedure
mechanically, without any understanding of the reasons, justifications, and principles behind each step
(Cobb et al., 1992; Greeno, 1982; Mayer, 1992; Ohlsson & Rees, 1991). Given that one of the
contemporary missions of the National Council of Teachers of Mathematics (1989) is to promote learning
with understanding, one approach to meeting this objective is to teach better question asking skills.

We have developed computer software that requires students to ask questions and that exposes them to
good questions. Our "Point and Query" (P&Q) software forces students to learn entirely by asking
questions and reading answers to the questions (Graesser, Langston, & Lang, 1992; Graesser, Langston,
& Baggett, 1993). In order to ask a question, the student first points to a word or picture element on the
computer screen and then to a question that is relevant to the element (from a menu of relevant questions).
The menu of relevant questions is formulated on the basis of background knowledge structures and a
theory of human question answering called QUEST (Graesser & Franklin, 1990; Graesser, Gordon, &
Brainerd, 1992; Graesser & Hemphill, 1991; Graesser, Lang, & Roberts, 1991). The P&Q system is
similar to some other menu-based question asking systems that have been developed (Schank, Ferguson,
Birnbaum, & Greising, 1991; Sebrechts & Swartz, 1991). The incidence of student questions is quite
high on the P&Q software. Whereas a student asks .1 question per hour in a classroom and 27 questions
per hour in a tutoring session, the student asks 135 questions per hour when using the P&Q software.

The P&Q software is a promising environment for teaching question asking skills. The quality of the
students' questions should improve by exposing them to good questions on the question menu. After
extensive experience with the P&Q software, students would automatize good question asking skills. This
might have a radical impact on improving comprehension because, as discussed earlier, there is extensive
evidence that comprehension improves after students are trained how to ask good questions.
Claim 3: Students reveal their knowledge in their answers to topic-relevant questions.

Ideally, the tutor should be able to adjust the level of instruction and remediation to the idiosyncratic
knowledge deficits and misconceptions of a particular student. This requires the tutor to have a valid way
of assessing what the student understands. The developers of many intelligent tutoring systems, for
example, have embraced student modeling as an important principle of ITS design (Anderson & Reiser,
1985; Burton & Brown, 1982; Clancey, 1983; Ohlsson, 1986; van Lehn, 1990). Hence the question
arises: How does the tutor accurately infer what the student knows? We performed some analyses on the
research methods corpus in order to determine whether the students' achievement is reflected in their
questions and their answers to questions.

Table 1 presents correlations between student achievement and several measures of student questions and
answers. Consider first the measures that do not correlate with achievement. Tutors did not accurately
infer student knowledge on the basis of the frequency of student questions or the proportion of student
questions that were knowledge-deficit questions. These correlations were either nonsignificant or
marginally significant at a lax alpha-level.

Tutors also could not accurately gauge student understanding by merely asking the students (e.g., "Do you
understand?", "Do you follow?", "Okay?"). When these comprehension-gauging questions are asked, the
student either answers YES ("I understand"), answers NO ("I don't understand"), or gives an indecisive
response (no answer, "I don't know"). Are these answers a valid reflection of the student's true
understanding? The data revealed that they are not accurate. There was a near zero correlation between
student achievement and the likelihood of the students' answering YES. In fact, this relation was found to
be significantly curvilinear: .46, .62, .61, and .52 for students receiving final grades of A, B, C, and
C-/D, respectively. This was the only significant curvilinear relationship in all of the correlational analyses
involving the measures in Table 1. Regarding the NO answers, there was a significant positive correlation
between exam scores and the likelihood of students' answering NO ("I don't understand"). This is a
counterintuitive outcome: It was the good students who tended to say that they did not understand. Chi et
al. (1989) also reported a positive correlation in the domain of physics between student understanding and
the likelihood of students answering NO. Therefore, available evidence indicates that a tutor cannot
simply ask students whether they understand and expect the students to supply accurate feedback. The
feedback is misleading. Students are very poor at calibrating their own comprehension of material
(Glenberg, Wilkinson, & Epstein, 1982; Weaver, 1990).

According to Table 1, there was a robust correlation between achievement and the proportion of student
questions that were deep reasoning questions. This correlation was discussed earlier. We suspect,
however, that it would be difficult for the tutor to gauge student understanding by this index. An average
student asks only 8 deep reasoning questions per hour, so the tutor would be basing the computation on a
low frequency event. Although good students had a higher proportion of deep reasoning questions than
poor students, the absolute frequency of deep reasoning questions did not significantly vary with student
achievement (because good students tended to ask fewer questions). It would indeed be a very subtle
cognitive computation for the tutor to estimate the proportion of student questions that are deep reasoning
questions. We conclude that the occurrence of students' deep reasoning questions does not provide a
reliable basis for inferring student knowledge.

The students' answers to topic-related questions provided the most reliable basis for inferring student 
knowledge. There was a robust negative correlation between student achievement and the proportion of 
students' answer contributions that were in the categories of error-ridden, vague, or no-answer. There 
was a positive correlation between achievement and student answers that were completely correct. It 
should be noted that tutors asked a large number of questions (104 questions per hour), so there was 
ample opportunity for the students to give answers and for the tutor to evaluate the quality of the answers. 
Therefore, it is the tutor's burden to judiciously select questions that diagnose the student's knowledge 
deficits, bugs, and deep misconceptions. 

Tutor Contributions in Tutorial Dialogue 

We have established that the tutor plays the primary role in setting the agenda, introducing topics, selecting 
exemplar problems, and asking questions. In fact, 90-95% of the new topics and problems were initiated 
by the tutor in the research methods corpus and the algebra corpus. The tutor asked 78-82% of the 
questions. The tutor established the ground rules and format in all of the tutoring sessions. This section 
identifies the pedagogical strategies and dialogue patterns that were implemented by the tutor. 

Claim 4: Sophisticated tutoring strategies are rare. 

Tutors rarely implemented sophisticated tutoring strategies, such as the Socratic method (Collins, 1985), 
inquiry teaching (Collins, 1988), the reciprocal training method (Palincsar & Brown, 1984), and 
modeling-scaffolding-fading (Collins et al., 1989; Rogoff, 1990). These methods were virtually 
nonexistent in the research methods corpus and the algebra corpus. It takes a large amount of training and 
experience for tutors to use these sophisticated pedagogical strategies. It is therefore not surprising that the 
strategies were nonexistent in our sample of 13 tutors, and presumably are nonexistent in real school 
settings. There should be high payoffs in learning outcomes for those researchers and practitioners who 
introduce sophisticated tutoring strategies in research projects and in school curricula. 

Claim 5: Most of the tutors' activities and questions are generated by curriculum scripts. 

We analyzed a sample of tutor questions in order to determine what mechanisms generate tutor questions 
and what agenda is set by the tutor. We selected 249 questions from the research methods corpus and 93 
questions from the algebra corpus. Approximately half of the questions were deep reasoning questions (as 
defined earlier) and half were short-answer questions (e.g., concept completion, quantification, feature 
specification). For each of these questions, we identified one or two mechanisms that generated the 
question (see Table 2). We also specified how the tutorial dialogue continued after the tutor question was 
answered (see Table 3). The latter analysis provides a snapshot of the typical agenda set by the tutor or 
initiated by the student. 

The data in Tables 2 and 3 support the conclusion that the tutors' curriculum scripts generated most of the 
tutor questions, new subtopics, and tutoring activities. The curriculum script consists of a set of 
subtopics, examples, and questions that the tutor selects for the tutoring session (Putnam, 1987). In the 
case of the research methods corpus, the tutor selected the subtopics in a top-down fashion. The selected 
subtopics had a close correspondence to the information in the chapter excerpts and the index cards 
supplied by the experimenter (with the major topic and 3-5 subtopics). Virtually all of the examples 
selected by the tutor came directly from the book. Very often a tutor introduced the same example, 
subtopic, or question to several students that were tutored on a particular topic. Most (67%) of the 
questions were asked in the context of an example problem in the research methods course. Examples 
played an even more predominant role in the algebra corpus; 92% of the tutor questions were asked in the 
context of a specific example. The tutor normally selected a problem from the student's examination or 
textbook. After the tutor selected the example problem, the tutor typically coached the student to a 
solution, or the tutor and student collaboratively solved the problem. It should be noted that the 
curriculum script is not necessarily a rigid structure in terms of the selection of material and the ordering of 
material. According to McArthur et al. (1990), the tutor revises and replans the agenda throughout the 
course of the tutoring session. The revision and replanning are no doubt influenced by the student's 
performance. 

Claim 6: Very few of the tutors' questions and activities are triggered by student errors and 
misconceptions. 

The results in Tables 2 and 3 support this claim. The tutor did not spend much time diagnosing, 
dissecting, and troubleshooting the student errors that were manifested in the dialogue. According to 
diagnosis-remediation models of intelligent tutoring (Anderson & Reiser, 1985; VanLehn, 1990), the tutor 
should spend time diagnosing and correcting the student's conceptual bugs and misconceptions. These 
bugs and misconceptions are manifested in errors committed by the student. As will be reported later, 
the tutor does normally correct errors that surface. However, the tutor does not spend much time 
rectifying the buggy rules and deep misconceptions that explain the errors. It is very difficult for a tutor to 
identify the underlying bugs and misconceptions, let alone to repair them. Consequently, tutors do not 
normally invest the time in such activities. 

Claim 7: A 5-step dialogue frame is a pervasive dialogue pattern. 

An extremely pervasive dialogue pattern consisted of a 5-step dialogue frame that was initiated by a tutor 
question. 

Step 1: Tutor asks question 

Step 2: Student answers question 

Step 3: Tutor gives short feedback on the answer 

Step 4: Tutor improves the quality of the answer by directly supplying information or by initiating 
a collaborative exchange 

Step 5: Tutor assesses the student's understanding of the answer 

Figure 1 specifies further the components of this dialogue frame. An example of this frame is provided 
below: 

1. TUTOR: Now what is a factorial design? 

2. STUDENT: The design has two variables. 

3. TUTOR: Uh-huh. 

4. TUTOR: So there are two or more independent variables and one dependent variable. 

5. TUTOR: Do you see that? 
STUDENT: Uh-huh. 
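
The frame can also be written down as a small data structure. The following Python sketch is illustrative only: the step labels and the `is_complete_frame` helper are hypothetical names introduced here, not part of the original coding scheme. It checks whether a coded sequence of turns walks through the five steps in order, allowing steps 2 and 3 to repeat when the tutor pumps the student for more information.

```python
import re
from enum import Enum


class Step(Enum):
    """Steps of the 5-step dialogue frame (labels are illustrative)."""
    TUTOR_QUESTION = "1"
    STUDENT_ANSWER = "2"
    SHORT_FEEDBACK = "3"
    IMPROVE_ANSWER = "4"
    ASSESS_UNDERSTANDING = "5"


# Step 1, then one or more (answer, feedback) rounds, then steps 4 and 5.
_FRAME = re.compile(r"1(23)+45")


def is_complete_frame(steps):
    """Return True if the coded steps form one complete 5-step frame."""
    coded = "".join(step.value for step in steps)
    return _FRAME.fullmatch(coded) is not None


# The factorial-design excerpt, coded turn by turn.
example = [
    Step.TUTOR_QUESTION,        # "Now what is a factorial design?"
    Step.STUDENT_ANSWER,        # "The design has two variables."
    Step.SHORT_FEEDBACK,        # "Uh-huh."
    Step.IMPROVE_ANSWER,        # "So there are two or more independent variables..."
    Step.ASSESS_UNDERSTANDING,  # "Do you see that?"
]
print(is_complete_frame(example))
```

Under this encoding, a sequence that stops after step 3 is not a complete tutoring frame, because steps 4 and 5 are missing.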

In step 1, the tutor normally asks a single question. Sometimes the question is not posed clearly or as 
intended, so the tutor revises the question. Successive tutor questions drift systematically in a manner that 
makes it easier for the student to answer the question (Graesser, 1992). For example, in the excerpt 
below, an answer to the first question would involve an elaborate construction of information, whereas a 
simple YES or NO would be an adequate answer to the second question. 

TUTOR: So how could we do that [operationally define intelligence]? I mean, do you think that 
everyone agrees on what intelligence is? 

In the following example, the tutor restates the question in different words that provide a more succinct 
focus on the intended question. It illustrates that the process of constructing a question is iteratively 
distributed over time. 

TUTOR: Did you see how they did that? How did they manage to do that? What did they do 
there? 

Sometimes the student does not understand the question, particularly when the question is not adequately 
specified. The student asks a counter-clarification question to gain clarity on what the question is. The 
tutor answers the embedded counter-clarification question and then the student answers the original 
question. This is illustrated in the excerpt below. 

TUTOR: Why would a researcher even want to use more than two levels of an independent 
variable in an experiment? 
STUDENT: More than two levels? 
TUTOR: Uh-huh. 
STUDENT: They would, um, it'd be real accurate 'cause it would show if there's a curvilinear. 

In step 2, the student produces an answer to the question. The student's answer is constructed 
iteratively over time, as the above example illustrates. Answers are not immediately 
articulated in a clear, succinct, coherent form. The student frequently produces single words or incoherent 
fragments of information. The tutor ends up working with these fragments (in step 4) in a fashion that 
allows a reasonable answer to evolve. When a student's initial answer is incomplete, the tutor frequently 
pumps the student for additional information by expressing neutral feedback in step 3 (e.g., "uh huh"). 
There is an iteration of steps 2 and 3 when the tutor pumps the students for more answer information. 

In step 3, the tutor gives short feedback on the student's answer. The feedback is positive, negative, or 
neutral. Most of the time the feedback is expressed verbally. Occasionally the tutor nods or shakes his 
head to express feedback. When the feedback is neutral on the written transcript, it is necessary to view 
the videotape and code the intonation of the utterance in order to accurately classify the feedback as 
positive, negative, or neutral (Fox, 1992). We have found that 34% of the neutral observations on the 
written transcripts ended up being either positive or negative when the videotape was viewed. Tutors 
rarely used lengthy pauses or hesitations to signify negative feedback. The likelihood of the tutor pausing 
or hesitating in step 3 did not vary as a function of the quality of the student's answer in step 2; the mean 
likelihoods were .08, .13, .15, and .13 when the students' answers were error-ridden, vague, partially 
correct, and completely correct, respectively. 

In step 4, the tutor initiates a variety of methods to improve the quality of the answer (see Figure 1). 
Sometimes the tutor directly splices in the correct answer. More frequently, the tutor uses scaffolding 
techniques that encourage the student to supply information in a collaborative fashion. For example, the 
tutor might provide a hint or ask an embedded question, as illustrated below. 

[The tutor and student are discussing how to operationally define the quality of a restaurant.] 

TUTOR: What type of scale would that be? 
STUDENT: Oh, let me think, which one. I don't know. 
TUTOR: Try to think. Nominal or (pause)? 
STUDENT: Ordinal, yeah. 

TUTOR: It would be. Why would it be an ordinal scale? 

Therefore, the construction of an answer is a collaborative activity - not a burden that rests entirely on the 
shoulders of the student. On the average, the tutor ends up supplying more answer information than does 
the student, even though the tutor originally asks the question (Graesser, 1992). 

In step 5, the tutor assesses whether the student understands the answer. In most cases, the tutor simply 
asks the student whether the student understands ("Do you understand?", "Do you follow?", "Okay?"). 
Unfortunately, student answers to these comprehension-gauging questions are inaccurate, as was discussed in the 
context of claim 3. Tutors occasionally ask a simple follow-up question that tests the student's 
understanding of the answer (1% of the cases). Very rarely does the tutor thoroughly test the student's 
understanding by asking a complex question or by requiring the student to solve a problem, as illustrated 
below. 

TUTOR: Do you have any problem with these kinds of word problems (referring to a section in 
the book)? Where they say-- 
STUDENT: (interrupts) Uh, not really. 

TUTOR: You don't? You don't? You don't have any trouble with that? 
STUDENT: No. 

TUTOR: Let's just do one of them. Um, Dan earned 56 dollars, which was twice more than 
what Jim earns. Now you're supposed to write an equation. 
STUDENT: Uh, I can't write the equations. 

Teachers in classrooms normally enact a 3-step dialogue frame instead of a 5-step dialogue frame. Mehan 
(1979) identified a persistent dialogue pattern in classrooms which includes elicitation, response, and 
evaluation. The teacher elicits information from the student, the student responds, and then the teacher 
evaluates the response. This classroom dialogue pattern corresponds to the first three steps of our 5-step 
dialogue frame in tutoring. What makes tutoring special is the prevalence of the two extra steps (4 and 5). 
It is conceivable that these extra two steps account for the advantage of tutoring over classroom settings. 

Claim 8: Question answering is a collaborative exchange. 

Research in conversation has emphasized the point that conversation is a collaborative activity (Clark & 
Schaefer, 1989; Kreuz & Roberts, 1993). The listener assists the speaker by filling in words and by 
providing backchannel feedback that acknowledges that the listener is following what the speaker is saying 
("uh huh"). The listener does this while the speaker is speaking. 

Not surprisingly, question answering is a collaborative activity in tutorial dialogue. This claim is 
supported in a simple analysis of the number of turns in the answers to tutor questions. There would be 
only two turns if the student answered the question (step 2) and the tutor supplied feedback (step 3). 
Mehan's (1979) elicitation-response-evaluation sequence requires a minimum of two turns. In fact, 
however, there are many more turns when tutors pose questions in a naturalistic tutoring environment. 
The median number of turns was 5 in the research methods corpus and 10 in the algebra corpus. The tutor 
and student collaborate in the construction of answers to questions. 

Claim 9: Tutors need to pose questions with higher specification. 

Tutors elliptically deleted words, phrases, and clauses from their questions under the assumption that the 
context is sufficiently rich for the student to reconstruct the intended question. Unfortunately, tutors are 
frequently incorrect in making this assumption. As a consequence, the student ends up misinterpreting the 
question or answering the wrong question. Tutor questions were classified on degree of specification, 
with values of high, medium, and low (Graesser & Person, in press). Only 2% of the questions had high 
specification and 50% had low specification. Students sometimes did not have enough context to interpret 
the question so they asked counter-clarification questions (see step 1 in Figure 1). The likelihood of a 
student asking a counter-clarification question decreased as question specification increased: 
.17, .08, and .00 for tutor questions that were low, medium, and high in specification, respectively. Therefore, 
tutors should make every effort to formulate their questions with a higher degree of specification. 

Claim 10: Tutors need to ask more long-answer questions. 

Tutors need to ask better questions in step 1 of the 5-step dialogue frame. More specifically, questions 
could be posed in a manner that exposes more reasoning on the part of the student, such as the deep 
reasoning questions. Graesser and Person (in press) reported that there was a tendency for tutors to ask 
simple short-answer questions that required minimal contributions from the student (e.g., a single word, a 
YES/NO decision). Tutors need to be trained on question asking skills that encourage the student to 
become a more substantial contributor. 

Claim 11: Tutors need to wait longer for student answers. 

Tutors could be more patient in allowing the student to supply an answer in step 2 of the 5-step dialogue 
frame. Students need time to think, reason, and plan an answer (Dillon, 1988). The knowledge is 
normally fragile so it takes considerable time to construct an answer. Tutors do frequently pump the 
student for additional answer information in step 2, as mentioned earlier. However, the tutors could 
increase the pause duration in step 2 so the student has ample time to think and reason. In a classroom 
study reported by Swift, Gooding, and Swift (1988), learning improved when teachers increased the 
pause duration. 

Claim 12: The tutor's feedback on student answers needs to be more discriminating. 

A good tutor presumably adjusts the feedback in step 3 to the quality of the student's answer in step 2. 
We performed some analyses that tested this intuitively plausible claim. We segregated student answer 
contributions into four quality levels: error-ridden, vague (or no answer), partially correct, and completely 
correct. Short feedback consisted of the brief positive, negative, or neutral responses in step 3 (e.g., 
"yeah", "right", "good", "okay", "uh huh", "not so", head movement). Long feedback consisted of 
lengthier comments on answer quality during step 4 (e.g., "that is correct because...", "there is a problem 
with your claim that..."). Corrective feedback is a more complex form of negative feedback; the tutor 
produces information in step 4 that corrects erroneous or misleading information in a student's 
contribution. 

Table 4 presents our analyses of tutor feedback as a function of the quality of the students' contributions. 
Most of the feedback was provided in the short form (step 3). The long feedback provided a very small 
increment of evaluative information. Corrective feedback was particularly important in the case of 
error-ridden answers. We performed statistical analyses on the data by treating each of the 13 tutors from the 
two corpora as a case. We collapsed the error-ridden and vague answers in order to obtain a sufficient 
number of observations for each tutor. The likelihood of a tutor giving positive feedback (long or short) 
increased as a function of answer quality, F(2,24) = 30.27, p < .05. There were significant differences 
among all three levels of answer quality (error-ridden/vague, partially correct, versus completely correct). 
The likelihood of a tutor giving negative feedback significantly decreased as a function of answer quality, 
F(2,24) = 24.38, p < .05. Once again, there were significant differences among all pairs of means. These 
findings indicate that tutors give discriminating feedback to the students. 

On the other hand, the tutors were not perfectly discriminating when they administered positive and 
negative feedback. When error-ridden answers were produced by students, the tutors gave positive and 
negative feedback with an equal likelihood, F(1,12) = .01. When the students produced vague answers, 
the tutors were more likely to give positive feedback than negative feedback. Clearly, the feedback is off 
the mark in these cases. Part of the reason for this misleading feedback is that tutors are reluctant to give 
negative feedback. Perhaps the tutors believe that negative feedback will traumatize the student or reduce 
the willingness of the student to supply information. Alternatively, perhaps tutors are following the politeness 
conventions of normal conversation (Brown & Levinson, 1987). 

Tutors frequently "spliced in" correct information when a student produced error-ridden answers. Yet the 
tutors did not normally acknowledge the error as an error, or pursue the implications of an error-ridden 
statement (see also McArthur et al., 1990). There was a significantly higher likelihood of giving corrective 
feedback than short negative feedback or long negative feedback, F(2,24) = 35.87, p < .05. It is quite 
possible that students were unaware that their contributions were error-ridden. Table 5 summarizes how 
the tutors responded to the errors of the students. 

Claim 13: Tutors improve answer quality with a variety of scaffolding strategies. 

Step 4 in Figure 1 lists many of the strategies that the tutor uses to improve the quality of the answer. 
Sometimes the tutor directly splices in the correct answer. Alternatively, the tutor encourages the student 
to collaborate by asking follow-up questions, giving hints, offering suggestions, and so on. Step 4 is the 
critical locus of applying scaffolding techniques. 

We performed some analyses that traced the evolution of an answer to each question. We observed the 
quality of contribution N+1, given that the tutor and student had together achieved a particular level of 
quality via contributions 1 to N. Once again, there were four levels of answer quality: error-ridden, 
vague/nothing, partially correct, and completely correct. A transition matrix was prepared for the tutor; 
this specified the likelihood that a tutor supplied a contribution of quality Q at N+1, given that the student 
and tutor had achieved a cumulative state of quality C at contribution N. A similar transition matrix was 
prepared for the student. This analysis permitted us to quantify the quality of the information that was 
supplied by each speech participant. 
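
As a rough illustration of how such a transition matrix can be tabulated, the Python sketch below counts transitions from the cumulative quality state at contribution N to the quality of contribution N+1 and normalizes each row into likelihoods. The function name, the data layout, and the toy observations are assumptions made for this example; they are not the coded data from the two corpora.

```python
from collections import Counter, defaultdict

QUALITY_LEVELS = [
    "error-ridden",
    "vague/nothing",
    "partially correct",
    "completely correct",
]


def transition_matrix(observations):
    """Estimate P(quality of contribution N+1 | cumulative quality at N).

    `observations` is an iterable of (cumulative_state, next_quality) pairs
    for one kind of speaker (tutor or student).
    """
    counts = defaultdict(Counter)
    for state, quality in observations:
        counts[state][quality] += 1

    matrix = {}
    for state, row in counts.items():
        total = sum(row.values())
        # Each row is normalized so that its likelihoods sum to 1.
        matrix[state] = {q: row[q] / total for q in QUALITY_LEVELS}
    return matrix


# Toy observations, invented for illustration only.
toy = [
    ("vague/nothing", "vague/nothing"),
    ("vague/nothing", "partially correct"),
    ("error-ridden", "completely correct"),
    ("error-ridden", "completely correct"),
    ("error-ridden", "partially correct"),
    ("error-ridden", "error-ridden"),
]
tutor_matrix = transition_matrix(toy)
print(tutor_matrix["error-ridden"]["completely correct"])
```

Building one matrix from tutor contributions and another from student contributions would allow the two speakers to be compared state by state.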

Table 6 presents the transition matrices for the tutors and students in the two corpora. The data can be 
interpreted from many perspectives. We were intrigued by three patterns. 

A. The tutor waited for the student to supply information when the cumulative quality of the answer was 
vague or nothing. This generalization can be captured by the following production rule: 

IF [quality of cumulative collaborative exchange = vague or no answer] 
THEN [tutor pumps student for more information] 

The tutors were reluctant to give a completely correct answer when the cumulative quality was vague or no 
answer; the likelihoods were .12 and .03 in the research methods corpus and the algebra corpus, 
respectively. The comparable likelihoods for students were significantly higher (.37 and .14). Therefore, 
it was the student, not the tutor, that supplied correct information in this situation, even though the tutor 
was more knowledgeable. Tutors normally pumped the student with neutral feedback at step 3 (e.g., "uh 
huh") in order to encourage the student to supply more information (particularly at the beginning portion of 
an answer). Tutors were reluctant to rush in with a complete answer at the beginning of the answer 
evolution. 

B. The tutor spliced in a partially correct or completely correct answer when the student committed an 
error. This generalization is captured by the following production rule: 

IF [student's contribution is error-ridden] 
THEN [tutor splices in an answer that is partially or completely correct] 

The likelihood of a tutor giving a partially or completely correct answer on contribution N+1 significantly 
varied as a function of the cumulative quality state at contribution N, F(3,36) = 8.43, p < .05 (when 
combining the 13 tutors from the two corpora). The likelihoods were .59, .62, .58, and .81 for the 
quality states of completely correct, partially correct, vague/no-answer, and error-ridden at contribution N. 
The .81 value was significantly higher than the other values. Therefore, tutors had the tendency to splice 
in a good answer when students committed errors. They frequently did this without informing the student 
that the student's answer was error-ridden (see claim 12). 

C. The tutor carried the burden of summarizing or recapping the answer. The production rule for this 
generalization is: 

IF [quality of the cumulative collaborative exchange = completely correct] 
THEN [tutor supplies a summary or recap of the answer] 

Tutors were more likely than students to give a completely correct answer when the cumulative exchange 
had already reached the quality state of completely correct: .16 versus .04, respectively, F(1,12) = 6.08, p 
< .05. It would be preferable for the student to take on the burden of providing these summaries and 
recaps because such activities improve organization and retention. Tutors perhaps need to be trained to 
shift this burden onto the student. 
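
The three production rules can be collected into a single lookup. This Python sketch is a simplification under stated assumptions: the function name and string labels are hypothetical, rule B is keyed on the student's latest contribution while rules A and C are keyed on the cumulative quality of the exchange, and checking rule B first is an arbitrary choice made here. The analysis in this section reports likelihoods, not deterministic behavior, so the function returns only the tutor move that each rule describes as typical.

```python
def typical_tutor_move(cumulative_quality, last_student_quality=None):
    """Return the tutor move described by production rules A-C, or None."""
    # Rule B: splice in a better answer when the student commits an error.
    if last_student_quality == "error-ridden":
        return "splice in a partially or completely correct answer"
    # Rule A: pump for more information when the answer so far is vague or empty.
    if cumulative_quality in ("vague", "no answer"):
        return "pump student for more information"
    # Rule C: summarize or recap once the exchange is completely correct.
    if cumulative_quality == "completely correct":
        return "supply a summary or recap of the answer"
    # No rule covers this state (e.g., a partially correct exchange).
    return None


print(typical_tutor_move("vague"))
```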

There are a large number of sophisticated scaffolding techniques that could be applied in step 4 of the 
5-step dialogue frame. Tutors would need to be trained to use these techniques effectively. For example, 
the modeling-scaffolding-fading technique could be delivered more completely and skillfully. Tutors need 
to learn how to fade and let the student take more control when they are starting to achieve some success. 
We were struck by the fragmentary and poorly articulated contributions of the student. As a consequence, 
the tutors supplied most of the information, leaving the students to fill in short contributions (e.g., a single 
word, phrase, proposition, step, number). The tutors could relinquish control of the conversation much 
sooner and could gradually encourage students to supply longer contributions. 

Claim 14: Tutors do not adequately assess whether the student understands the answer. 

The tutor assesses whether the student understands the answer in step 5 of the 5-step dialogue frame. In 
92% of the observations, the tutor conducted this assessment by simply asking the student a 
comprehension-gauging question (e.g., "Do you understand?", "Do you follow?", "Okay?"). 
Unfortunately, the students' answers to these comprehension-gauging questions were notoriously 
unreliable, if not misleading (see claim 3 and Table 1). Tutors apparently assume that students understand 
anything that gets discussed during tutoring. If something gets said, tutors assume that it must be 
understood; the tutors merely seek a quick verification from the student that this is the case. 

A good tutor would assess the student's understanding more rigorously. The tutor could ask one or more 
follow-up questions that are diagnostically discriminating and that troubleshoot potential 
misunderstandings. The tutor could present a similar problem and request that the student solve it in order 
to actively demonstrate understanding. However, the 13 tutors in our naturalistic sample were rarely 
rigorous in step 5. 

Claim 15: Tutors need to violate some pragmatic rules of polite conversation. 

The pragmatic rules of normal polite conversation have been identified by Grice (1975) and others (Brown 
& Levinson, 1987). These rules are pervasive and highly automatized. Unfortunately, they sometimes 
present a barrier to effective pedagogy. A good tutor may need to violate some rules and conversational 
maxims in order to crack the barrier. For example, rather than following the Gricean "maxim of quantity," 
tutors need to be redundant and repetitious to enhance student understanding. Instead of being polite and 
"face saving" when a student makes an error, the tutor needs to "take off the gloves" and directly confront 
the student. 

The rules followed by participants in normal conversations have been described by Grice (1975). 
Discourse is governed by one overarching cooperative principle: conversational participants make a good 
faith effort to contribute and to collaborate in the ongoing discourse. Cooperation is augmented by four 
conversational maxims: quantity (don't say more or less than is required), quality (don't say things that are 
untrue or that lack evidence), relevance (don't say things that are extraneous), and manner (don't say 
things that are vague or disordered). 

Brown and Levinson (1987) studied linguistic politeness in several cultures. They proposed some general 
principles and discourse strategies to facilitate social interaction. Central to their analysis is the notion of 
face, or one's self-image. Individuals in a culture attempt to maintain a positive self-image, and help 
others to maintain their self-images. This is not always possible, however, because face is frequently 
endangered by face-threatening acts, such as requests, criticisms, and demands. Each culture has a 
number of linguistic strategies to mitigate the impact of these face-threatening acts. 

Table 7 presents some of the maxims of Grice and politeness strategies of Brown and Levinson. 
Associated with each of these are costs and benefits from the perspective of effective pedagogy during 
tutoring. It is appropriate to follow the maxims and politeness strategies under some conditions, but to 
violate them under other conditions. 

The following example illustrates that there are potential pedagogical costs to the politeness strategy of 
"avoiding disagreement." The tutor and student were discussing various types of graphs. 

TUTOR: ...and that's our frequency distribution... What is that one called again (pointing to a 
bar graph)? 
STUDENT: A histogram. 
TUTOR: Alright, or a bar graph. 
STUDENT: Bar graph. 

The student failed to acknowledge the important distinction between histograms (involving continuous 
variables) and bar graphs (involving discrete variables). However, the tutor did not acknowledge that the 
student had made an error; in fact, the tutor gave potentially positive feedback in step 3 ("alright"). The 
tutor was sufficiently ambiguous in step 4 to permit the erroneous interpretation that a histogram and a bar 
graph are interchangeable. 

Once again, a good tutor may need to breach the normal conversational maxims and politeness strategies. 
This could be very uncomfortable to the student, of course. A possible solution to this problem would be 
to establish some "conversational ground rules" at the beginning of a tutoring session. The tutor could 
explain to the student that it is important for the tutor to provide critical feedback, to point out 
misconceptions, and to challenge the student. The tutor could encourage the student to articulate answers 
in detail and not to get rattled when negative feedback is given. The tutor could resurrect the adage that 
students learn from their errors. It is a question for further research whether these conversational ground 
rules will minimize face-threatening acts during tutoring, and whether systematic violations of maxims will 
facilitate learning. 

Computational Models of Speech Act Prediction: Quantification of Dialogue Patterns 

Researchers in discourse processing, sociology, and sociolinguistics have analyzed prominent dialogue 
patterns (Clark & Schaefer, 1989; D'Andrade & Wish, 1985; Goffman, 1974; Graesser, 1992; Mehan, 
1979; Schegloff & Sacks, 1973; Turner & Cullingford, 1989). Some of the systematicity resides at a 
categorical level that does not consider the world knowledge, beliefs, and goals of the speech participants. 
That is, there are appropriate orderings of speech act categories and inappropriate orderings. Schegloff 
and Sacks (1973) analyzed the adjacency pairs of conversational turns: Given that one speaker utters a 
speech act in category C during turn N, what is the appropriate speech act category for the other speaker at 
the next, adjacent turn N+1? The most common adjacency pair is the [Question -> Reply-to-question] 
sequence. The adjacency pair analysis considers only one speech act of prior context when generating 
predictions for the subsequent speech act. 

Researchers have identified larger sequences of dialogue patterns. Mehan (1979) identified a frequent 
triple in classroom environments, as illustrated below. 

TEACHER QUESTION: What is the capital of Florida? 
STUDENT ANSWER: Athens. 
TEACHER EVALUATION OF ANSWER: No, that's not right. 

As discussed in the previous section, this triplet is expanded to a 5-step dialogue frame in tutoring 
environments. Counter-clarification questions produce a quadruple sequence, as illustrated below. 

QUESTION-A: Where did you go yesterday? 
QUESTION-B: Yesterday morning? 
ANSWER-B: Yeah, in the morning. 
ANSWER-A: To Jack's, for breakfast. 

The knowledge accumulated in the study of dialogue patterns has been fragmented and largely untested. 
No one has developed a model that ties together the assorted observations. No one has quantified how 
successfully these patterns account for the speech acts in naturalistic conversation. There is no model that 
is sufficiently broad in scope that it could be applied to any conversation or text. In view of these 
shortcomings, we developed some computational models that attempt to capture the systematicity in speech 
act sequences (Graesser, Swamer, Baggett, & Sell, in press; Swamer, Graesser, Franklin, Sell, Cohen, & 
Baggett, 1993). Two classes of the models have radically different computational architectures: a 
connectionist architecture and a symbolic architecture. 

The computational models assume that the stream of conversation (or text) can be segmented into a linear 
sequence of speech act categories. There have been extensive debates over what speech act categories are 
needed for a satisfactory analysis of human conversation (see D'Andrade & Wish, 1985). We adopted a 
slightly modified version of D'Andrade and Wish's (1985) set of speech act categories. Their categories 
were both theoretically motivated and empirically adequate in the sense that trained judges could agree on 
the assignment of categories. Table 8 presents the 8 speech act categories that were adopted in our 
analyses. Given that there are two speakers in a dialogue, each speech act in a conversation can be in one 
of 16 categories (2 speakers x 8 basic speech acts = 16). A Juncture (J) category was also included in 
order to signify lengthy pauses in a conversation and excerpts that are uninterpretable to judges. This 
yielded 17 categories altogether. In summary, the stream of dyadic conversation was segmented into a 
sequence of speech acts and each speech act was assigned to one of 17 speech act categories. 
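
The category scheme can be made concrete with a short sketch. The abbreviations (Q, RQ, D, ID, A, E, R, N, J) and labels of the form Q1 or RQ2 follow the usage elsewhere in this report; the one-hot encoding helper and all Python names are illustrative assumptions, not part of the original analysis.

```python
from itertools import product

# Eight basic speech act abbreviations as listed in the text, plus Juncture.
BASIC_ACTS = ["Q", "RQ", "D", "ID", "A", "E", "R", "N"]
SPEAKERS = [1, 2]
JUNCTURE = "J"

# 2 speakers x 8 basic acts = 16 categories, plus J = 17 categories.
CATEGORIES = [f"{act}{spk}" for spk, act in product(SPEAKERS, BASIC_ACTS)]
CATEGORIES.append(JUNCTURE)


def one_hot(category):
    """Return a 17-element vector with a 1 at the position of `category`.

    This encoding is an assumption made for illustration; it mirrors the
    idea of one node per speech act category.
    """
    vector = [0] * len(CATEGORIES)
    vector[CATEGORIES.index(category)] = 1
    return vector


if __name__ == "__main__":
    print(len(CATEGORIES))   # number of categories
    print(one_hot("Q1"))     # speaker 1 asks a question
```

A coded conversation is then simply a list of these labels, for example a question by speaker 1 followed by a reply from speaker 2.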

Conversations analyzed 

Children's dyads. Sell, Cohen, Grain, Duncan, MacDonald, and Ray (1991) adopted this 17-category 
speech act scheme in their analysis of 90 conversations involving pairs of children. Dyads of second 
graders and sixth graders were videotaped for 10 minutes in three different contexts: playing 20 questions, 
solving a puzzle, and free play. The dyads were further segregated according to how well they knew 
each other: mutual friends (A and B like each other), unilateral friends (A likes B, but B neither likes nor 
dislikes A), and acquaintances (A and B neither like nor dislike each other). All of the children in 
the dyads were from the same classroom so they were never strangers. Sell et al. (1991) reported that the 
17-category speech act scheme could be successfully applied to the 16,657 speech acts in this corpus. 
Trained judges could segment the stream of conversation into speech acts with high reliability. The 17 
categories were sufficiently complete in the sense that all of the speech acts fit into one of the 17 
categories. Trained judges also could reliably categorize the speech acts; the Cohen's kappas were .82, 
.76, and .74 for the question task, the puzzle task, and the free play task, respectively. There was a mean 
of 2.3 speech acts per conversational turn. 

College tutoring. A subset of the research methods tutoring corpus was extracted and analyzed. We 
extracted all deep reasoning questions posed by the tutor (i.e., why, how, what-if, as discussed earlier). 
The question and answer sequence for each of these questions was included in the college tutoring corpus. 
There were 2013 speech acts in this corpus, and a mean of 2.9 speech acts per conversational turn. 

Telephone conversations. We had access to a corpus of telephone conversations recorded by the Nynex 
corporation. The conversations were between telephone operators and customers in New York City. 
There were 1102 speech acts in this corpus, and 2.5 speech acts per turn. 

Goodness-of-prediction (GOP) score 

The goal of each model was to capture the systematicity in the sequential ordering of the speech act 
categories. That is, to what extent can the category of speech act N+1 be successfully predicted, given the 
sequence of speech acts 1 through N? A hit rate is the likelihood that a theoretically predicted category 
actually occurs in the data, as specified in formula 1. 

p(hit) = p(category C occurred at N+1 | category C is predicted by the model at N+1)    (1) 

A hit rate is not a satisfactory index of the success of a model, however, because there is no consideration 
of the likelihood that a speech act would occur by chance. For example, if a particular speech act category 
occurred in the corpus 90% of the time, then there would be a high hit rate, assuming that the model 
predicted that category most of the time. A satisfactory index of the model's success would need to 
control for the baserate likelihood that the predicted speech act occurred in the empirical distribution of 
speech act categories (called the a posteriori distribution). For example, the baserate likelihoods of the 
speech act categories in the Sell corpus were .21, .14, .04, .02, .40, .03, .07, .03, and .07 for categories 
Q, RQ, D, ID, A, E, R, N, and J, respectively. We computed a goodness-of-prediction (GOP) score that 
corrected for the baserate likelihood that a speech act category would occur by chance, as specified in 
formula 2. 

GOP score = [hit-rate(category C) - baserate(C)] / [1.0 - baserate(C)]    (2) 

Sometimes a model specified that more than one speech act category could occur at observation N+1. In 
this case, formulas 1 and 2 are still correct except that the values are based on a set of categories rather than 
a single category. 
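
Formulas 1 and 2 can be illustrated with a brief sketch. The function names and the toy sequence are hypothetical, and the way the baserate is aggregated over steps (summing the corpus proportions of the predicted categories at each step, then averaging) is an assumption made for illustration; the report does not spell out that detail.

```python
# Baserate likelihoods reported for the Sell corpus (collapsed over speakers).
SELL_BASERATES = {"Q": .21, "RQ": .14, "D": .04, "ID": .02, "A": .40,
                  "E": .03, "R": .07, "N": .03, "J": .07}


def gop_score(hit_rate, baserate):
    """Formula 2: hit rate corrected for chance agreement."""
    if baserate >= 1.0:
        raise ValueError("baserate must be below 1.0")
    return (hit_rate - baserate) / (1.0 - baserate)


def evaluate(predicted_sets, observed, baserates):
    """Return (hit rate, baserate, GOP) for a sequence of predictions.

    predicted_sets[i] is the set of categories predicted for step i and
    observed[i] is the category that actually occurred at that step.
    """
    n = len(observed)
    hits = sum(obs in pred for pred, obs in zip(predicted_sets, observed))
    # Assumed aggregation: chance level of a predicted set is the summed
    # baserate of its members, averaged over all steps.
    chance = sum(sum(baserates[c] for c in pred) for pred in predicted_sets)
    hit_rate, baserate = hits / n, chance / n
    return hit_rate, baserate, gop_score(hit_rate, baserate)


if __name__ == "__main__":
    observed = ["Q", "RQ", "A", "A"]
    predicted = [{"Q"}, {"RQ"}, {"A"}, {"Q"}]   # last prediction is a miss
    print(evaluate(predicted, observed, SELL_BASERATES))
```

Applying `gop_score` to a hit rate of .472 and a baserate of .178 gives roughly .358, which illustrates how a moderate hit rate shrinks once chance agreement is removed.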

Computational models 

Recurrent connectionist network. Researchers in the connectionist camp of cognitive architectures have 
developed a recurrent network that is suitable for capturing the systematicity in the temporal ordering of 
events (Cleeremans & McClelland, 1991; Elman, 1990). The recurrent connectionist network preserves 
an encoding of all previous inputs and uses this information to induce the structure underlying temporal 
sequences. 

There are four layers of nodes in the recurrent network, as shown in Figure 2. The input layer specifies 
the category of speech act N. There are 17 nodes in the input layer, one for each speech act category. The 
appropriate node is activated when speech act N is received. For example, if person 1 asked a question, 
then the Q1 node would be activated in the input layer of the network. The output layer contains the 
network's predictions for speech act N+1. There are 17 output nodes, one for each speech act category. 
An output node has an activation value that reflects the degree to which the network predicts the 
corresponding category. If the input were Q1, for example, then we would expect RQ2 to receive a high 
activation value in the output layer. This would capture the regularity that people are expected to answer 
questions that others ask. The hidden layer captures higher order constituents that are activated by speech 
act N. Hidden layers are frequently implemented in connectionist architectures in order to capture internal 
cognitive mechanisms (Rumelhart & McClelland, 1986). The hidden layer is needed when direct 
input-output mappings fail to capture systematicity in the data. There were 10 nodes in the hidden layer of 
our network. The context layer allows the network to induce temporal sequences. The context layer stores 
the activations from the hidden layer of the previous step in the speech act sequence (as designated by the 
fixed weights of 1 in Figure 2). The activations of the hidden layer at step N depend on: (a) the input at N 
and (b) the activation of the context layer at N (which was the hidden layer at N-1). Therefore, the hidden 
layer is receiving information about the present input and past inputs. The resulting activation pattern of 
the hidden layer's 10 nodes at step N is subsequently copied into the context layer at step N+1. The 
context layer must have the same number of nodes as the hidden layer, namely 10 nodes in our model. 
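
The flow of activation through the four layers can be sketched as follows. This is a minimal, untrained forward pass under stated assumptions: logistic activation functions, no bias terms, and small random starting weights are all choices made here for illustration, and the class and attribute names are hypothetical. Training by backpropagation is omitted.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class RecurrentSketch:
    """Forward pass of a simple recurrent network with a context layer."""

    def __init__(self, n_in=17, n_hidden=10, n_out=17, seed=0):
        rng = random.Random(seed)

        def rand_matrix(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                    for _ in range(rows)]

        self.w_in = rand_matrix(n_hidden, n_in)       # input -> hidden
        self.w_ctx = rand_matrix(n_hidden, n_hidden)  # context -> hidden
        self.w_out = rand_matrix(n_out, n_hidden)     # hidden -> output
        self.context = [0.0] * n_hidden               # empty at the start

    def step(self, category_index):
        """Present speech act N (as an index) and return output activations."""
        hidden = []
        for j in range(len(self.w_in)):
            # One-hot input: only the weight from the active input node counts.
            net = self.w_in[j][category_index]
            net += sum(w * c for w, c in zip(self.w_ctx[j], self.context))
            hidden.append(sigmoid(net))
        output = [sigmoid(sum(w * h for w, h in zip(row, hidden)))
                  for row in self.w_out]
        # Fixed weights of 1: hidden activations are copied into the context.
        self.context = list(hidden)
        return output


if __name__ == "__main__":
    net = RecurrentSketch()
    for index in [0, 9, 4]:   # an arbitrary sequence of category indices
        activations = net.step(index)
    print(len(activations))
```

Because the context layer carries forward the previous hidden pattern, presenting the same speech act after different histories produces different output activations, which is the property that lets the network use more than one speech act of context.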

There are a total of 440 connections that are allowed to vary in the weight space of this model. There are 
170 connections between the input layer and the hidden layer, given that there are 17 input nodes and 10 
hidden layer nodes. Similarly, there are 170 connections from the hidden layer to the output layer. The 
other 100 connections link the 10-node context layer to the 10-node hidden layer. There are also 
connections from the hidden layer to the context layer that are fixed at 1.0. In preliminary simulations, we 
varied the number of nodes in the hidden layer and the context layer (from 6 to 14 nodes). However, the 
success of the model did not significantly depend on the number of nodes in these layers, at least within the 
range of 6 to 14 nodes. 

The performance of the recurrent network was evaluated by computing two different GOP scores (see 
formula 2). A maximal activation GOP score considered only one output node as the predicted speech act 
category for step N+1. The predicted category was the one that had received the highest activation value in 
the output layer. An above-threshold GOP score allowed for the network to accommodate multiple speech 
act categories at each step. All output nodes that met or exceeded a threshold activation level were 
predictions for step N+1. Preliminary tests had revealed that a threshold of .18 provided an appropriate fit 
to the three corpora. On the average, 1.7 speech acts were above threshold at any given step in the 
conversation. 
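
The two ways of reading predictions off the output layer can be sketched as below. The function names and the example activation values are hypothetical; only the threshold of .18 comes from the text.

```python
def max_activation_prediction(activations):
    """Predict the single category whose output node is most active."""
    best = max(range(len(activations)), key=lambda i: activations[i])
    return {best}


def above_threshold_predictions(activations, threshold=0.18):
    """Predict every category whose activation meets or exceeds the threshold."""
    return {i for i, a in enumerate(activations) if a >= threshold}


if __name__ == "__main__":
    activations = [0.05, 0.62, 0.18, 0.10]   # made-up output activations
    print(max_activation_prediction(activations))
    print(above_threshold_predictions(activations))
```

Either prediction set can then be scored with the hit rate and GOP formulas; the above-threshold variant trades a larger predicted set (and therefore a higher baserate) for a higher hit rate.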

We tested some connectionist models that removed one or more components of the recurrent connectionist 
model. This permitted us to assess which components of the recurrent connectionist model had the most 
robust impact on the prediction of speech act systematicity. 

Double-entry backpropagation network. This network considered only two speech acts of context (N-1 
and N) when predicting speech act N+1. This was accomplished by removing the context layer of the 
recurrent network (see Figure 2) and adding 17 nodes for N-1 as additional nodes in the input layer 
(yielding 34 input nodes). The hidden layer was preserved. There were 510 connections in the weight 
space for this network. 

Single-entry backpropagation network. This network considered only one speech act of context (N) when 
predicting speech act N+1. This was accomplished by removing the context layer of the recurrent 
network, but preserving the hidden layer. There were 340 connections in the weight space. 

Perceptron. This network removed both the hidden layer and the context layer of the recurrent network. 
Thus, there were direct connections between the input layer and the output layer. There were 289 
connections in the weight space (17 x 17 = 289). 
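
The connection counts quoted for the four networks follow directly from the layer sizes. The helper below is a hypothetical illustration that counts only the trainable connections (the fixed copy-back weights of 1.0 are excluded) and assumes, as the reported totals imply, that no bias weights are counted.

```python
def trainable_connections(n_input, n_hidden, n_output, n_context=0):
    """Count adjustable weights in a fully connected layered network."""
    if n_hidden == 0:
        # Perceptron: inputs connect directly to outputs.
        return n_input * n_output
    return (n_input * n_hidden        # input -> hidden
            + n_context * n_hidden    # context -> hidden (recurrent only)
            + n_hidden * n_output)    # hidden -> output


if __name__ == "__main__":
    networks = {
        "recurrent":    trainable_connections(17, 10, 17, n_context=10),
        "double-entry": trainable_connections(34, 10, 17),
        "single-entry": trainable_connections(17, 10, 17),
        "perceptron":   trainable_connections(17, 0, 17),
    }
    for name, count in networks.items():
        print(name, count)
```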

Recursive transition network (RTN). This model had a symbolic computational architecture (Graesser, 
Swamer, Baggett, & Sell, in press; Stevens & Rumelhart, 1975). One advantage of a symbolic 
architecture is that the investigator can trace and articulate the dialogue patterns that explain systematicity in 
the data. In contrast, it is difficult to identify patterns in a weight space from a connectionist model and to 
articulate the patterns succinctly. 

Figure 3 shows a recursive transition network (RTN) for speech act prediction that was developed by 
Graesser, Swamer, Baggett, and Sell (in press). Some modules in the RTN would be anticipated on the 
basis of common sense and theoretical developments in the literature. Following Clark and Schaefer 
(1989), for example, the RTN in Figure 3 segregates a Contribution from an Acknowledgment of the 
contribution by the other party. There are four modules that emanate from the Contribute node 
(Interrogate, Inform, Direct, and Evaluate), which capture four basic goals of communication. Counter- 
clarification questions (i.e., k-Interrogate) are embedded in the second step of the Interrogate, Direct, and 
Evaluate modules. The Challenge module is a reaction of person A when person B tries to evaluate 
something or B tries to get A to do something (i.e., the Evaluate and Direct modules, respectively). 

The RTN in Figure 3 has seven modules, altogether. Each module has two or three state nodes and a set 
of arcs that emanate from each state node. The arc specifies the set of legal speech act categories and set of 
recursively embedded modules that are legal at that point. The speech act categories are the same 8 
categories that were defined earlier: Q, RQ, D, ID, A, E, V, and N. There are 7 recursively embedded 
modules: Contribute, Acknowledge, Interrogate, Direct, Evaluate, Inform, and Challenge. The i and k are 
indices that keep track of which of the two individuals is speaking. In some cases, the same individual 
produces a sequence of speech acts. In other cases, the turn transfers to the other person. 

The RTN generates a set of legal speech acts at each step of the conversation. A speech act at N+1 is legal 
if it lies on at least one path in the family of alternative paths that emanate from speech act N. A hit occurs 
when speech act N+1 matches one of the legal alternatives. Hit rates and GOP scores can be computed in 
the same way that they were computed for the recurrent connectionist network (see formulas 1 and 2). In 
a discrete RTN, there is an all-or-none prediction for each speech act at step N+1. In a weighted RTN, 
each arc is weighted according to the likelihood that the arc would be traversed while accounting for the 
speech act corpus; consequently, each speech act was predicted with some likelihood that varied from 0 to 
1. We tested a weighted RTN because it provided a closer fit to the data. This was accomplished by an 
optimization procedure that determined the best-fit set of weights which maximized the GOP score. A 
speech act was scored as predicted if it met or exceeded a strength threshold. 

Schegloff and Sacks' adjacency network. This was an RTN that captured the adjacency pair analysis of 
Schegloff and Sacks (1973). Therefore, only one speech act of context would be considered when 
predicting speech act N+1, and the speaker of N was always a different speaker than the speaker of N+1. 
The speech act categories of Schegloff and Sacks were translated into those categories in Table 8. 
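
The adjacency idea (one speech act of context, obligatory speaker change, weighted arcs with a strength threshold) can be sketched as a lookup table. Everything specific here is hypothetical: the arc weights, the threshold value, and the function names are invented for illustration and are not the weights fitted in the reported simulations.

```python
# Hypothetical arc weights: likelihood of the next category given the last one.
ADJACENCY_WEIGHTS = {
    "Q":  {"RQ": 0.80, "Q": 0.15, "A": 0.05},
    "RQ": {"E": 0.40, "Q": 0.35, "A": 0.25},
    "A":  {"E": 0.30, "A": 0.30, "Q": 0.25, "R": 0.15},
    "D":  {"R": 0.50, "N": 0.30, "Q": 0.20},
}


def other_speaker(speaker):
    return 2 if speaker == 1 else 1


def predict_next(speech_act, speaker, threshold=0.2):
    """Return the set of speaker-tagged categories predicted for step N+1.

    Only the category at step N is consulted, and the predicted speech act
    always belongs to the other speaker.
    """
    next_speaker = other_speaker(speaker)
    arcs = ADJACENCY_WEIGHTS.get(speech_act, {})
    return {f"{category}{next_speaker}"
            for category, weight in arcs.items() if weight >= threshold}


if __name__ == "__main__":
    print(predict_next("Q", 1))                  # strict threshold
    print(predict_next("Q", 1, threshold=0.1))   # looser threshold
```

Lowering the threshold enlarges the predicted set, which is why the number of predicted speech acts has to be equated before two models are compared.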

Performance of models in predicting speech act categories 

Table 9 presents performance data on the four connectionist models of speech act prediction. 
Goodness-of-prediction (GOP) scores are listed for each model and corpus. Table 9 also includes the hit 
rate, baserate, and mean number of speech acts predicted by the recurrent connectionist network. It was 
possible to perform statistical analyses on the simulations of the connectionist networks by having a 
different set of random starting weights in the weight space and running the simulation 10 times. As a 
crude, but conservative estimate, a GOP score difference of .010 is significant (p < .05). 

Maximum activation GOP scores were available for the four connectionist models. The predicted speech 
act for a model was the one speech act that had the highest activation score. The recurrent connectionist 
network was the best network according to this performance measure. When averaging over the three 
corpora, the GOP scores were .337, .317, .290, and .290 for the recurrent network, the double-entry 
backpropagation network, the single-entry backpropagation network, and the perceptron, respectively. A 
very similar pattern of scores emerged for the above-threshold GOP scores, where more than one speech 
act was predicted: .439, .442, .326, and .328, respectively. In this case, however, there was no difference 
between the recurrent network and the double-entry backpropagation network. These results are 
consistent with the following conclusions. 

1. The recurrent connectionist network correctly predicts the next speech act 34-44% of the time 
(after controlling for baserate guessing). 

2. The average number of predicted speech act categories is 1.7. 

3. Only 2 (or possibly 3) speech acts of context are effective in formulating successful 
predictions of the next speech act category. (This was further substantiated in follow-up analyses 
of the recurrent network that plotted GOP scores as a function of the number of context items 
available). 

4. Two speech acts of context are much better than one. 

The third conclusion suggests that it is futile for speakers to plan several speech acts into the future. 
Speakers are constantly replanning, re-evaluating, and revising the conversation in the face of constantly 
changing situational constraints (Clark & Schaefer, 1989; McArthur et al., 1990; Winograd & Flores, 
1986). Speaker A's next speech act category appears to be formulated on the basis of speaker A's last 
speech act together with speaker B's last speech act. The context prior to this is not very useful for 
formulating predictions. A global, top-down, expectation-driven model of conversation would have 
problems explaining our results. 

The performance of the recurrent connectionist network was compared to the two recursive transition 
networks. In order to compare each RTN network with the recurrent connectionist network, we computed 
a model comparison ratio, which is specified in formula 3. 

Ratio = GOP (RTN | S speech acts predicted) / GOP (recurrent | S speech acts predicted)    (3) 

The GOP score of the recurrent network was yoked to the GOP score of the RTN network so that both 
models predicted the same number of speech acts at N+1 (on the average). A model comparison ratio 
score of 1 means that the two models perform the same. A ratio of less than 1 means the recurrent 
network performs best, whereas a ratio of greater than 1 means that the RTN performs best. 

The recurrent connectionist network performed better than the two RTNs. The maximum values of the 
model comparison ratios were determined over varying values of S (i.e., number of predicted speech acts, 
which vary with the threshold value). For Graesser's RTN, the maximum values were .89, .43, and .50 
in the children's dyad corpus, the college tutoring corpus, and the telephone corpus, respectively. The 
mean numbers of predicted speech acts at a step were 6.6, 2.9, and 3.7, respectively. Therefore, on the 
average, 61% of the systematicity that was picked up by the recurrent connectionist network was also 
captured by Graesser's RTN. The performance of the Schegloff and Sacks RTN was much worse. The 
maximum model comparison ratios were .53, .29, and .12, respectively, so this second RTN captured 
only 31% of the systematicity of the recurrent connectionist network. In this case, the mean numbers of 
predicted speech acts at a step were 2.7, 2.8, and 2.9, respectively. The fact that the adjacency RTN 
performed much more poorly than the Graesser RTN supports conclusion 4 (i.e., two speech acts of 
context are quite a bit better than one). 
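
The 61% and 31% figures are the means of the three maximum model comparison ratios reported above, as the short calculation below shows. The function and variable names are hypothetical; the ratio values are those given in the text.

```python
def model_comparison_ratio(gop_rtn, gop_recurrent):
    """Formula 3, with both GOP scores taken at the same number S of predictions."""
    return gop_rtn / gop_recurrent


def mean(values):
    return sum(values) / len(values)


# Maximum ratios for the children's dyad, college tutoring, and telephone corpora.
GRAESSER_RTN_RATIOS = [0.89, 0.43, 0.50]
ADJACENCY_RTN_RATIOS = [0.53, 0.29, 0.12]

if __name__ == "__main__":
    print(round(mean(GRAESSER_RTN_RATIOS) * 100))    # percent of systematicity
    print(round(mean(ADJACENCY_RTN_RATIOS) * 100))
```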

Viewed from another perspective, it could be argued that Graesser's RTN did an impressive job in 
capturing the systematicity of the speech act sequencing. We might view the recurrent connectionist model 
as a statistical upper bound in capturing the sequential systematicity in dialogue patterns (when considering 
only speech act categories, not the content of the speech acts). Graesser's RTN captures 61% of the 
upper bound in systematicity. This is perhaps an impressive figure. 

Additional analyses 

Follow-up analyses were performed in order to answer some additional questions about the dialogue 
patterns. We analyzed the children's dyad data to assess whether GOP scores varied as a function of type 
of task, age, and type of relationship. These analyses revealed that the type of task had a robust impact on 
GOP scores. The maximum activation GOP scores were .38, .07, and .18 in the question task, the puzzle 
task, and free play, respectively. The children apparently engaged in parallel monologues in the puzzle 
task, whereas the 20-questions game placed substantial constraints on the dialogues. In contrast, the age 
of the children and the type of social relationship (i.e., mutual friends, unilateral friends, versus 
acquaintances) had no detectable impact on the GOP scores. 
Correlations Between Student Achievement and Properties of Student Questions and Answers 

                                                            Achievement Measure 
Measures of Student Questions and Answers             Examination Scores    Final Grade 

Total number of student questions                           -.22              -.34*** 
Proportion of student questions that are 
   knowledge deficit questions                               .15               .32 
Proportion of student questions that are 
   deep-reasoning questions                                  .44*              .58* 
Proportion of students' answer contributions 
that are: 
   Completely correct                                        .32**             .43* 
   Partially correct                                         .09              -.09 
   Vague or no answer                                       -.30              -.46* 
   Error-ridden                                             -.32**            -.10 
   Error-ridden, vague, or no answer                        -.52*             -.49* 
Proportion of Yes answers (by student) to 
   comprehension-gauging questions (by tutor)                .07               .05 
Proportion of No answers (by student) to 
   comprehension-gauging questions (by tutor)                .42*              .20 

*   p < .05, two-tailed 
**  p < .06, one-tailed 
*** p < .10, two-tailed 



Table 2 

Mechanisms that Generate Tutor Questions. 

                                                        CORPUS 
                                                 Research 
MECHANISMS                                       Methods      Algebra 

Curriculum script                                  .70          .93 
Driven by student error                            .05          .06 
Elaboration of an idea                             .19          .03 
Summary-recap                                      .14          .01 
Get student to justify something, explain 
   something, or generate an example               .14          .01 
Other                                              .03          .00 




Table 3 

Continuations After Tutor Question Is Answered. 

                                                                    CORPUS 
                                                             Research 
                                                             Methods      Algebra 

Activity or question guided by tutor's curriculum script       .67          .79 
Tutor diagnoses, dissects, or remediates student errors        .02          .04 
Elaboration of an idea                                     [illegible]      .03 
Summary-recap                                                  .15          .06 
Tutor prompts student to introduce next topic or example       .05          .00 
Student initiates next topic or example                        .05          .10 
Other                                                          .05          .01 






Table 4 

Tutor Feedback as a Function of Quality of Student Contributions. 

                                                          Quality of Student Answer 
                                                 Error-     None or     Partially    Completely 
Measure                       Corpus             ridden     Vague       Correct      Correct 

Number of observations        Research Methods     48         56          131        [missing] 
                              Algebra              47       [missing]   [missing]       25 

Proportion of observations    Research Methods    .13        .15          .36          .36 
                              Algebra             .24        .07          .56          .13 

Positive feedback 
Short feedback                Research Methods    .31        .40          .47          .56 
Long or short feedback        Research Methods    .31        .45          .50          .63 
Short feedback                Algebra             .30        .23          .65        [illegible] 
Long or short feedback        Algebra             .30        .31          .73          .92 

Negative feedback 
Short feedback                Research Methods    .10        .00          .01          .00 
Long or short                 Research Methods    .12        .04          .03          .00 
Long, short, or corrective    Research Methods    .40        .12          .07          .04 
Short feedback                Algebra             .36        .15          .10          .00 
Long or short                 Algebra             .36        .15          .11          .00 
Long, short, or corrective    Algebra             .83        .23          .17        [missing] 

Neutral or no feedback 
                              College             .31        .50          .44          .33 
                              Algebra             .11        .54          .12          .08 






Table 5 

Analysis of Student Errors Manifested in the Sample of Tutor Questions. 

                                                                      CORPUS 
                                                               Research 
                                                               Methods      Algebra 

Number of errors in sample                                       48           47 

Type of error 
   Slip                                                          .16          .13 
   Bug or glitch                                                 .25          .23 
   Deep misconception                                            .59          .63 

Tutor's treatment of error 
   Error is acknowledged in short or long feedback               .12          .36 
   Tutor splices in correct answer                               .40          .36 
   Tutor supplies a hint                                         .10          .45 
   Tutor reasons to expose derivation of correct answer          .17          .34 
   Tutor asks student question to extract correct answer         .17          .21 
   Tutor issues directive to extract correct answer              .04          .06 

Likelihood of the student catching his/her own error             .00          .04 



Table 6 

Contribution Transition Matrix: Status of Contribution on Turn N+1, Given the 
Cumulative Quality of the Answer During Turns 1 to N. 

TUTOR CONTRIBUTION 

             Research methods corpus            Algebra corpus 
                    Turn N+1                       Turn N+1 
Turn N       E      N/V     PC      CC      E      N/V     PC      CC 

CC          .00     .59     .24     .17    .00     .26     .60     .14 
PC          .00     .46     .39     .14    .00     .29     .62     .08 
N/V         .00     .56     .32     .12    .00     .28     .69     .03 
E           .06     .21     .44     .27    .00     .10     .78     .12 

STUDENT CONTRIBUTION 

             Research methods corpus            Algebra corpus 
                    Turn N+1                       Turn N+1 
Turn N       E      N/V     PC      CC      E      N/V     PC      CC 

CC          .01     .76     .19     .04    .04     .68     .24     .03 
PC          .08     .54     .25     .14    .12     .48     .34     .06 
N/V         .09     .33     .21     .37    .21     .27     .38     .14 

CC  = Completely correct answer 
PC  = Partially correct answer 
N/V = Nothing or vague answer 
E   = Error-ridden answer 
Table 9 

Performance of Four Connectionist Models of Speech Act Prediction. 

                                                              CORPUS 
                                             Children's     College      Telephone 
                                             Dyads          Tutoring     Conversations 

Maximum Activation Analysis 
Goodness-of-prediction Score 
   Recurrent connectionist network             .289           .364          .358 
   Double-entry backpropagation network        .292           .330          .330 
   Single-entry backpropagation network        .268           .311          .291 
   Perceptron                                  .268           .331          .291 
Hit rate (recurrent network)                   .379           .451          .472 
Base rate (recurrent network)                  .122           .136          .178 
Number of speech acts predicted                1              1             1 

Above Threshold Analysis 
Goodness-of-prediction Score 
   Recurrent connectionist network             .376           .420          .520 
   Double-entry backpropagation network        .367           .420          .540 
   Single-entry backpropagation network        .322           .364          .292 
   Perceptron                                  .320           .371          .292 
Hit rate (recurrent network)                   .565           .560          .696 
Base rate (recurrent network)                  .309           .242          .366 
Number of speech acts predicted                1.8            1.5           1.9 



Figure 1: The 5-Step Dialog Frame 

STEP 1: TUTOR ASKS QUESTION 

     IF the tutor cannot understand the question or the question is not posed 
     as intended, THEN the tutor asks a revised question. 

     IF the student does not understand the question, THEN the student asks 
     a counter-clarification question. 

          | 
          v 

STEP 2: STUDENT ANSWERS QUESTION 

     The tutor sometimes pumps the student for more answer information 
     by a neutral response (e.g., "uh-huh"). 

          | 
          v 

STEP 3: TUTOR GIVES SHORT FEEDBACK 

     The tutor's feedback is positive, negative, or neutral. 
     The feedback is linguistic or paralinguistic (e.g., head nod). 
     Intonation is important. 

          | 
          v 

STEP 4: TUTOR IMPROVES QUALITY OF ANSWER 

     The tutor splices in a complete or partial answer. 
     The tutor summarizes answer. 
     The tutor gives hint. 
     The tutor traces explanation or justification. 
     The tutor elaborates on answer. 
     The tutor asks question to elaborate on answer. 
     The tutor presents an example. 
     The tutor corrects a misconception. 
     The tutor issues a command or indirect request for student to 
     complete an activity. 

          | 
          v 

STEP 5: TUTOR ASSESSES STUDENT'S UNDERSTANDING 

     The tutor asks whether the student understands. 
     The tutor asks a simple question. 
     The tutor asks a complex question. 
     The tutor requests the student to solve a similar problem. 